Regression Analysis with Linked Data
Abstract
Record linkage, or exact matching, can be used to join two files that contain information on the same individuals but lack unique personal identification codes. The possibility of linkage errors causes problems for estimating the relationships between variables on the two files. The effect is analogous to that of measurement error. A model of a linear regression relationship between variables in linked files is proposed. Assuming that the probabilities that pairs of records are links are known, an unbiased estimator of the regression coefficients is derived. Methods for estimating the linkage probabilities by using mixture models are discussed. A consistent estimator of the covariance matrix of the proposed estimator is given. A bootstrap estimator is used to reflect the impact of the uncertainty in record linkage model parameters on the estimators of the regression parameters. A simulation study compares the performance of the proposed estimator and alternatives.
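The abstract does not state the estimator's formula, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that the linked response z_i equals the true y_j with known probability q_ij, so that E[z] = QXβ, and that regressing z on W = QX by least squares is therefore unbiased for β. The function names, the uniform form of Q, and the simulation settings are all hypothetical choices made for illustration.

```python
import numpy as np


def naive_ols(X, z):
    """Ordinary least squares of the linked response z on X, ignoring linkage error."""
    return np.linalg.lstsq(X, z, rcond=None)[0]


def linkage_adjusted_ols(X, z, Q):
    """Least squares of z on W = Q @ X.

    Assumption (not taken from the source text): E[z] = Q X beta, where
    Q[i, j] is the known probability that record i is linked to response j.
    """
    W = Q @ X
    return np.linalg.lstsq(W, z, rcond=None)[0]


def simulate(n=2000, beta=(1.0, 2.0), p_correct=0.7, seed=0):
    """Simulate linked data with a hypothetical, very simple linkage model.

    Each record keeps its own response with probability p_correct and
    otherwise receives the response of a uniformly chosen other record.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.asarray(beta) + rng.normal(scale=0.5, size=n)

    # Linkage probability matrix: p_correct on the diagonal, the rest spread evenly.
    Q = np.full((n, n), (1.0 - p_correct) / (n - 1))
    np.fill_diagonal(Q, p_correct)

    # Draw the index each record is actually linked to.
    own = np.arange(n)
    is_correct = rng.random(n) < p_correct
    other = rng.integers(0, n - 1, size=n)
    other = other + (other >= own)  # shift so a record never picks itself here
    linked = np.where(is_correct, own, other)

    z = y[linked]
    return X, z, Q


if __name__ == "__main__":
    X, z, Q = simulate()
    print("naive OLS:       ", naive_ols(X, z))
    print("linkage-adjusted:", linkage_adjusted_ols(X, z, Q))
```

In this toy setting the naive slope is pulled toward zero roughly in proportion to the share of false links, much like attenuation from measurement error, while the adjusted fit should recover a slope close to the simulated value. In practice Q would have to be estimated, which is the role the abstract assigns to the mixture models and the bootstrap.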
Similar resources
Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Quantitation of indirect sandwich enzyme-linked immunosorbent assay parameters.
The optimization of data from the indirect sandwich enzyme-linked immunosorbent assay has been commonly accomplished by linear regression analysis, even though the data are often essentially sigmoid. A new microcomputer software program (LISACRV) that uses a nonlinear regression statistical model to analyze the data from enzyme-linked immunosorbent assay titration experiments was developed.
Association Analysis of Some Stability Parameters in Bread Wheat Using ISSR Markers
Intersimple sequence repeat (ISSR) markers were evaluated in order to identify informative markers associated with drought tolerance indices in bread wheat (Triticum aestivum L.) genotypes. Eighteen ISSR primers amplified 92 loci among 20 bread wheat genotypes. Polymorphic information content (PIC) ranged from 0.46 (UBC-857, UBC-864, UBC-867, is9) to 0.21 (is7), with an average of 2.05. Stepwis...
Phenotypic and Genetic Analysis of Lori-Bakhtiari Lamb's Weight at Different Ages for Autosomal and Sex-Linked Genetic Effects
The data set used in this study contained 8793 records of lamb's weight (kg) from 320 sires and 2349 dams collected during 1989 to 2014 from the Lori-Bakhtiari flock at Shooli station in Shahrekord, Iran. Non-genetic factors and genetic parameters (partitioned into autosomal, sex-linked and maternal) of lamb's weight at different ages were estimated using without and with sex-linked genetic eff...
Hurdle, Inflated Poisson and Inflated Negative Binomial Regression Models for Analysis of Count Data with Extra Zeros
In this paper, we propose Hurdle regression models for analysing count responses with extra zeros. Maximum likelihood is used to estimate the model parameters. The proposed model is applied to an insurance dataset. In this example, many of the claim counts are equal to zero, which illustrates the application of the model with a zero-inflat...
Publication date: 2004